Structure
Volume Number: 1
Issue Number: 8
Column Tag: Special Projects
The Structure of a Microsoft BASIC Program
By Mike Steiner, MacTutor Contributing Editor
Unraveling the Mysteries
During a fit of boredom and out of curiosity, I started peeking with fEdit at
Microsoft BASIC programs to see how they are formatted when saved in compressed mode.
I found that if a character with the high bit set is not preceded by a quote, REM
(or a single quote) or DATA, then the BASIC interpreter considers it a keyword, part of
the coding of a number, or a syntax error. I received some help from Michael M. Boy
of Elgin AZ, who wrote a utility that locates itself in memory (Listing 1). (Programs
as presented here are for a 512K Macintosh. Change values as needed for a 128K Mac.
Some experimentation may be necessary to find the correct values. When you have
found the starting point, you can make the proper adjustments in listings 2 and 3.)
With this utility, we determined that a BASIC program usually starts at location
77002 (decimal) on a 512K Macintosh; however, on some occasions programs loaded
four to six bytes lower in memory. Apparently, this situation happens when another
application is run before BASIC is loaded. If the computer is reset, programs load at
77002. Mike then wrote a routine, which I modified (see listing 3), that prints a hex
and ASCII dump of itself. The routine peeks memory from the start of the program
until it finds the end of program marker. The memory dump is identical to that stored
on disk with the exception that one byte is prefixed to the disk file to show the nature
of the program (compressed or protected) and whether it was written in the Binary or
Decimal version of BASIC. This, however, is part of another column. Following is a
discussion of the format of Microsoft BASIC programs.
Format of Basic Program Lines
If a program line has n bytes, the line format is as follows:
Bytes 1 and 2: If the line is not numbered, the first byte of the line has the high
bit cleared. If the line is numbered, then this bit is set. The first two bytes (high
order byte first) show the length of the line; however, only the second hex digit (the
low four bits) of the first byte is used. The maximum number of bytes in the line is normally 255. However, if
there are colons or REM statements automatically inserted by BASIC (see below for a
discussion of tokens automatically inserted by BASIC), the maximum number may
exceed 255; the longest line I have seen had 259 bytes. The line length includes all
bytes in the line, including those used internally by BASIC and not displayed in the
program listing, such as the end of line marker.
The third byte is always $00.
Bytes 4 and 5: If the line is numbered, these bytes show the line number, high
byte first; the highest line number is 65529. If there is no line number the body of
the line starts with byte 4.
Bytes 6 (4 if no line number) through n-1: This is the information you typed in
the line.
Byte n: Always $00 to show end of line. This value ($00) may appear within a
line, but if it is not at position n, which is coded by the first two bytes, the program
recognizes that it is not the end of line marker.
A blank line is represented by “00 04 00 00.” This includes the end of line
marker. A blank numbered line is shown by “80 06 00 HB LB 00” where HB and LB
are the high and low bytes of the line number.
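The layout above can be sketched as a small decoder. This is an illustration in Python rather than BASIC, written only from the byte positions described in this column; the function name and the returned dictionary are my own choices, not part of Microsoft BASIC.

```python
def parse_line_header(line: bytes) -> dict:
    """Decode one stored program line, following the layout described above."""
    numbered = bool(line[0] & 0x80)              # high bit of byte 1: line is numbered
    length = ((line[0] & 0x0F) << 8) | line[1]   # low hex digit of byte 1, then byte 2
    minimum = 6 if numbered else 4               # sizes of the two kinds of blank line
    if length < minimum or len(line) < length:
        raise ValueError("length field does not fit the bytes supplied")
    if line[2] != 0x00 or line[length - 1] != 0x00:
        raise ValueError("expected $00 in byte 3 and in the final byte")
    if numbered:
        number = (line[3] << 8) | line[4]        # bytes 4 and 5, high byte first
        body = line[5:length - 1]
    else:
        number = None
        body = line[3:length - 1]
    return {"numbered": numbered, "length": length, "number": number, "body": body}
```

For the blank numbered line 80 06 00 00 0A 00, this returns a length of 6, line number 10, and an empty body.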
The end of program marker is “00 00 00 00 00” including the end of line
marker for the last line of the program. These five zeroes clearly describe the end of
the program when the first byte of the sequence is byte n. This sequence may also
appear in the body of a line as part of the coding of a declared double precision number.
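Because every line carries its own length, a dump can be walked line by line without scanning for zero bytes. The Python sketch below assumes that the four zeroes following the last line's terminator read as a length field of zero, and treats that as the end of the program; this is my interpretation of the marker, and the function name is hypothetical.

```python
def split_lines(image: bytes) -> list:
    """Split a program image into stored lines using each line's length field."""
    lines = []
    pos = 0
    while pos + 2 <= len(image):
        length = ((image[pos] & 0x0F) << 8) | image[pos + 1]
        if length == 0:                  # assumed end-of-program marker reached
            break
        if length < 4 or pos + length > len(image):
            raise ValueError("bad length field at offset %d" % pos)
        lines.append(image[pos:pos + length])
        pos += length                    # jump straight to the next line header
    return lines
```

A $00 inside the body of a line does no harm here, since the walker never looks at body bytes when deciding where a line ends.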
Data Format Within a Line
All text within quotes or following a REM or DATA statement is represented in
positive ASCII (i.e. high bit off). However, those characters that are typed in
conjunction with the Option key (e.g. “π” “÷” etc.) use negative ASCII (i.e. high bit
set). Numbers are coded in positive and negative ASCII. The formatting of numbers is
quite complex and is beyond the scope of this column.
Reserved words are represented by negative ASCII. There are only 128 negative
ASCII bytes possible, and there are over 200 reserved words; therefore some reserved
words are represented by pairs of bytes (both in negative ASCII). (See tables 1 and
2.) Any byte with high bit set that is not defined as a reserved word or part of the
coding for a number and is not part of a PRINT statement, a REM, or a DATA statement
is not displayed in the listing and will cause an error message when program execution
reaches it.
There are a few special cases: REM, ELSE, GOTO, and GOSUB.
Special Cases
REM: Microsoft BASIC lets you use the apostrophe character as an abbreviation
for REM. If you do, it inserts a $3A (colon) and an $AF (REM) before the apostrophe
($E8) token, so what is actually represented is “:REM'”. When BASIC sees these three
bytes, it suppresses listing the “:REM” ($3A $AF) in the list window. So, you use one
extra byte of memory whenever you use an apostrophe instead of REM at the beginning
of a line. (If you use REM, you need to put a space after it; with the apostrophe you do
not.) Using it within the line does not use any extra bytes because if you type REM
there, you have to precede it with a colon.
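The suppression of the inserted “:REM” can be imitated with a few lines of Python. This is only a sketch of the rule as stated above, using the three byte values given in the text; it is not BASIC's own listing routine, and it makes no attempt to skip quoted strings or encoded numbers, where the same three bytes could occur by coincidence.

```python
COLON, REM, APOSTROPHE = 0x3A, 0xAF, 0xE8   # byte values quoted in the text

def hide_inserted_rem(body: bytes) -> bytes:
    """Drop the $3A $AF pair wherever it sits directly before the $E8 token."""
    hidden = bytes([COLON, REM, APOSTROPHE])
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 3] == hidden:
            out.append(APOSTROPHE)       # keep only the apostrophe token
            i += 3
        else:
            out.append(body[i])
            i += 1
    return bytes(out)
```

A typed “:REM ” (3A AF 20) is left alone, which matches the byte count above: three bytes for the apostrophe form, and two for REM plus its space at the start of a line.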
ELSE: Similarly, if you type ELSE in an IF - THEN statement, BASIC precedes it
with a non-printing colon if you do not type one. You do not use any extra bytes in this
case because your only other option is to type the colon yourself. You decide whether
the colon is visible in the program listing by typing it, or not visible by letting BASIC
insert it.
GOTO ($97) and GOSUB ($96) are followed by “20 1B 00 00 00” and the label
name, if going to a labeled line. If going to a numbered line, the token is followed by
“20 0E 00” and the line number, which is represented by two bytes, high byte first.
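As an illustration, the two forms of a branch can be built in Python from the byte sequences quoted above. The function name is hypothetical, and I am assuming the label name is stored as plain ASCII text after the five bytes, which the description suggests but does not spell out.

```python
GOTO, GOSUB = 0x97, 0x96                 # token values given in the text

def encode_branch(token: int, target) -> bytes:
    """Build the bytes for GOTO/GOSUB to a line number (int) or a label (str)."""
    if isinstance(target, int):
        if not 0 <= target <= 65529:     # highest line number noted earlier
            raise ValueError("line number out of range")
        # 20 0E 00, then the line number, high byte first
        return bytes([token, 0x20, 0x0E, 0x00, target >> 8, target & 0xFF])
    # 20 1B 00 00 00, then the label name (assumed to be plain ASCII)
    return bytes([token, 0x20, 0x1B, 0x00, 0x00, 0x00]) + target.encode("ascii")
```

With these definitions, `encode_branch(GOTO, 100)` gives 97 20 0E 00 00 64, and a GOSUB to a label gives 96 20 1B 00 00 00 followed by the label's characters.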
Managing Memory
From the above information, we can see that if available memory is a constraint,
you are better off using line numbers rather than line labels in your programs. Line
numbers use only two bytes in the line whereas a label uses one byte for each
character in the label plus one more for the mandatory colon. Further, each reference
to a labeled line elsewhere in the program uses five bytes plus the length of the label,
whereas a reference to a numbered line always uses exactly five bytes. Of course, if
the line is not referenced anywhere in the program, neither a label nor a line number
is needed.
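The comparison can be put into numbers with a short Python calculation. It uses only the byte counts stated above; the function names are mine, and the label “getfunction” is taken from Listing 2 purely as an example.

```python
def numbered_cost(references: int) -> int:
    """Bytes used by a line number plus its references (5 bytes each)."""
    return 2 + 5 * references

def labeled_cost(label: str, references: int) -> int:
    """Bytes used by a label (name plus colon) plus its references."""
    definition = len(label) + 1              # one byte per character, plus the colon
    per_reference = 5 + len(label)           # five bytes plus the label itself
    return definition + per_reference * references

# Example: a line that is the target of three branches
saving = labeled_cost("getfunction", 3) - numbered_cost(3)
```

For three references, the 11-character label costs 60 bytes against 17 bytes for a line number, a difference of 43 bytes.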
Description of the Goodies
Listing 1 is the locator program that finds itself in memory by searching for the
marker text that follows the REM in its first line.
Listing 2 is the poke program that will poke the token of your choice into
memory to replace a REM statement, thereby self-modifying the program. This is
great for getting mathematical input and then executing it to return the value of an
inputted function. This same technique was used several years ago on the Apple II by
several companies to produce plot packages that could take an inputted function string
and plot the results. With this utility, you can accomplish this same technique on the
Macintosh.
Listing 3 is a program fragment that will do a memory dump of your program in
hex and ASCII.
Table 1 is a listing of the reserved words in Microsoft BASIC, sorted by ASCII
code. Use this table with the poke utility to convert remark statements into new BASIC
code dynamically.
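For readers who want to examine a saved dump away from the Macintosh, a hex and ASCII display of the kind described for Listing 3 can be sketched in Python. This is not a translation of that listing; the function name, the row format, and the use of a period for non-printing bytes are my own assumptions.

```python
def dump(data: bytes, base: int = 0, width: int = 16) -> list:
    """Return rows of 'address  hex bytes  ASCII' for the given bytes."""
    rows = []
    pad = width * 3 - 1                      # width of a full hex column
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join("%02X" % b for b in chunk)
        # bytes with the high bit set (tokens) and control codes print as "."
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append("%06d  %s  %s" % (base + off, hexpart.ljust(pad), text))
    return rows
```

Passing `base=77002` labels the rows with the decimal addresses used elsewhere in this column.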
Basic Listing #1: Locator Program
REM }|{here
x$ = "}|{here" : REM x$ must be the same as the REM on the above line
y$ = LEFT$(x$,1)
x = 42000! : REM start searching here, should be suitable for 128K Mac at this location
FOR i = x TO 512*1024
z$ = CHR$(PEEK(i))
IF z$ <> y$ THEN elp1
a$ = ""
FOR j = 1 TO LEN(x$)-1
a$ = a$ + CHR$(PEEK(i+j))
NEXT j
IF a$ = RIGHT$(x$,LEN(x$)-1) THEN PRINT "we got it at "; i : END
elp1:
IF i = x THEN PRINT "now at ";x : x = i + 1000
NEXT i
REM This program does not give the start of the program. It gives
REM the location where the first character in the REM statement begins.
REM Start of program is lower in memory.
Basic Listing #2: Poke Token in Memory
SUB printit STATIC
SHARED b
PRINT | (b) :REM the vertical bar is a place holder for the value to
REM be poked and is replaced with the token for the function to be executed
REM by POKE 77031. Run the program and list it again. The vertical bar
REM will be replaced by the function you selected. DO NOT INSERT ANY
REM TEXT BEFORE THE VERTICAL BAR or the program will not work. The bar,
REM however, may be replaced by any character.
END SUB
OPTION BASE 1
DIM funct (5),funct$(5)
DATA 130, 160, 181,183, 186, ATN, COS, SIN, SQR, TAN
FOR i = 1 TO 5: READ funct (i): NEXT
FOR i = 1 TO 5: READ funct$(i):NEXT
CLS
PRINT "Enter Function you want evaluated"
PRINT
PRINT " 1) ATN 2) COS 3) SIN"
PRINT " 4) SQR 5) TAN"
getfunction: INPUT "Your choice > ",a: IF a<1 OR a>5 THEN getfunction
INPUT "Enter value to be processed > ",b
POKE 77031!,funct (a):REM Poke the token into memory
PRINT: PRINT
PRINT "The "; funct$(a); " of "; b; "is ";